2,181 research outputs found

    Generalized Yule-Walker Estimation for Spatio-Temporal Models with Unknown Diagonal Coefficients

    We consider a class of spatio-temporal models that extend popular econometric spatial autoregressive panel data models by allowing the scalar coefficients to differ across locations (or panels). To overcome the innate endogeneity, we propose a generalized Yule-Walker estimation method, which applies least squares estimation to a Yule-Walker equation. The asymptotic theory is developed as both the sample size and the number of locations (or panels) tend to infinity, under a general setting for stationary and alpha-mixing processes that includes spatial autoregressive panel data models driven by i.i.d. innovations as special cases. The proposed methods are illustrated using both simulated and real data.
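To make the estimation idea concrete: in the simplest scalar case, a univariate AR(1) time series, the Yule-Walker equation relates the lag-1 autocovariance to the autoregressive coefficient, and solving it by least squares reduces to a ratio of sample autocovariances. The sketch below illustrates only this classical special case, not the paper's generalized spatio-temporal estimator; the function names and settings are our own.

```python
import random

def simulate_ar1(phi, n, seed=0):
    """Simulate an AR(1) process x_t = phi * x_{t-1} + eps_t with standard normal noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

def yule_walker_ar1(x):
    """Estimate phi from the lag-1 Yule-Walker equation gamma(1) = phi * gamma(0),
    whose least squares solution is the ratio of sample autocovariances."""
    n = len(x)
    mean = sum(x) / n
    c = [xi - mean for xi in x]
    gamma0 = sum(ci * ci for ci in c) / n
    gamma1 = sum(c[t] * c[t + 1] for t in range(n - 1)) / n
    return gamma1 / gamma0

x = simulate_ar1(0.6, 5000)
phi_hat = yule_walker_ar1(x)  # should be close to the true coefficient 0.6
```

The paper's setting replaces this scalar ratio with a least squares fit over a system of Yule-Walker equations across many locations, which is where the asymptotics in both dimensions come in.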

    Thresh: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation

    Fine-grained, span-level human evaluation has emerged as a reliable and robust method for evaluating text generation tasks such as summarization, simplification, machine translation, and news generation, and the derived annotations have been useful for training automatic metrics and improving language models. However, existing annotation tools implemented for these evaluation frameworks lack the adaptability to be extended to different domains or languages, or to modify annotation settings according to user needs, and the absence of a unified annotated data format inhibits research in multi-task learning. In this paper, we introduce Thresh, a unified, customizable, and deployable platform for fine-grained evaluation. By simply creating a YAML configuration file, users can build and test an annotation interface for any framework within minutes -- all in one web browser window. To facilitate collaboration and sharing, Thresh provides a community hub that hosts a collection of fine-grained frameworks and corresponding annotations made and collected by the community, covering a wide range of NLP tasks. For deployment, Thresh offers multiple options for annotation projects of any scale, from small manual inspections to large crowdsourcing ones. Additionally, we introduce a Python library to streamline the entire process, from typology design and deployment to annotation processing. Thresh is publicly accessible at https://thresh.tools

    Improving Large-scale Paraphrase Acquisition and Generation

    This paper addresses the quality issues in existing Twitter-based paraphrase datasets and discusses the necessity of using two separate definitions of paraphrase for identification and generation tasks. We present a new Multi-Topic Paraphrase in Twitter (MultiPIT) corpus that consists of a total of 130k sentence pairs with crowdsourcing (MultiPIT_crowd) and expert (MultiPIT_expert) annotations using two different paraphrase definitions for paraphrase identification, in addition to a multi-reference test set (MultiPIT_NMR) and a large automatically constructed training set (MultiPIT_Auto) for paraphrase generation. With improved data annotation quality and task-specific paraphrase definitions, the best pre-trained language model fine-tuned on our dataset achieves state-of-the-art performance of 84.2 F1 for automatic paraphrase identification. Furthermore, our empirical results also demonstrate that the paraphrase generation models trained on MultiPIT_Auto generate more diverse and higher-quality paraphrases than their counterparts fine-tuned on other corpora such as Quora, MSCOCO, and ParaNMT.
    Comment: The project webpage is at http://twitter-paraphrase.com/. Accepted at EMNLP 202

    Mechanical Properties Improvement of Ground Tire Rubber/Thermoplastic Composites Produced by Rotational Molding

    In this work, ground tire rubber (GTR)/thermoplastic composites were successfully produced by combining a dry-blending technique with a rotational molding process. To improve the mechanical properties of the resulting composites, different modification methods were used. On the rotomolded composites produced, a complete set of characterization was performed, including morphological, physical (density and hardness), and mechanical (tensile, flexural, and impact) properties. The first part of the work investigated the effect of chemical blowing agent and maple wood fiber concentration, as well as two GTR surface treatments (maleated polyethylene (MAPE) in solution and microwave irradiation), on the mechanical properties of GTR/linear low-density polyethylene (LLDPE) composites produced by rotational molding. The second part studied the effect of MAPE-treated GTR on the mechanical properties of GTR/polypropylene (PP) composites. Overall, the results showed that MAPE treatment of GTR was an effective approach for improving the compatibility and interfacial adhesion of GTR/thermoplastic composites. For example, the impact strength of the LLDPE/GTR (85/15) composite improved by 30% with the addition of 0.3 wt.% MAPE compared to the composite with the same GTR content without MAPE treatment, and a 52% improvement in impact strength was obtained for the PP/GTR (50/50) composite with 2 wt.% MAPE compared to the composite with the same content of untreated GTR.

    Dancing Between Success and Failure: Edit-level Simplification Evaluation using SALSA

    Large language models (e.g., GPT-4) are uniquely capable of producing highly rated text simplification, yet current human evaluation methods fail to provide a clear understanding of systems' specific strengths and weaknesses. To address this limitation, we introduce SALSA, an edit-based human annotation framework that enables holistic and fine-grained text simplification evaluation. We develop twenty-one linguistically grounded edit types, covering the full spectrum of success and failure across dimensions of conceptual, syntactic, and lexical simplicity. Using SALSA, we collect 19K edit annotations on 840 simplifications, revealing discrepancies in the distribution of simplification strategies performed by fine-tuned models, prompted LLMs, and humans, and find that GPT-3.5 performs more quality edits than humans but still exhibits frequent errors. Using our fine-grained annotations, we develop LENS-SALSA, a reference-free automatic simplification metric trained to predict sentence- and word-level quality simultaneously. Additionally, we introduce word-level quality estimation for simplification and report promising baseline results. Our data, new metric, and annotation toolkit are available at https://salsa-eval.com.
    Comment: Accepted to EMNLP 202

    MultiTalk: A Highly-Branching Dialog Testbed for Diverse Conversations

    We study conversational dialog in which there are many possible responses to a given history. We present the MultiTalk Dataset, a corpus of over 320,000 sentences of written conversational dialog that balances a high branching factor (10) with several conversation turns (6) through selective branch continuation. We make multiple contributions to the study of dialog generation in this highly branching setting. To evaluate a diverse set of generations, we propose a simple scoring algorithm, based on bipartite graph matching, that optimally incorporates a set of diverse references. We study multiple language generation tasks at different levels of predictive conversation depth, using textual attributes induced automatically by pretrained classifiers. Our culminating task is a challenging theory-of-mind problem: a controllable generation task that requires reasoning about the expected reaction of the listener.
    Comment: 7 pages, AAAI-2
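The matching-based scoring idea, pairing each generated response with a distinct reference so that a diverse set of generations is rewarded over a repetitive one, can be sketched with a toy similarity function. This is an illustrative reconstruction under our own assumptions (Jaccard token overlap, equal-size sets, brute-force matching), not the paper's exact algorithm or similarity measure.

```python
from itertools import permutations

def jaccard(a, b):
    """Token-set Jaccard similarity between two sentences."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def matched_score(generations, references):
    """Score a set of generations against a set of diverse references by
    finding the one-to-one assignment that maximizes total similarity
    (optimal bipartite matching, brute-forced here for small equal-size sets)."""
    best_total = max(
        sum(jaccard(g, r) for g, r in zip(generations, perm))
        for perm in permutations(references)
    )
    return best_total / len(generations)  # average similarity under the best matching

gens = ["fast dogs run", "a cat sat down"]
refs = ["the cat sat", "dogs run fast"]
score = matched_score(gens, refs)  # each generation is credited against its best distinct reference
```

For larger sets, the brute-force loop over permutations would be replaced by a polynomial-time assignment solver (e.g., the Hungarian algorithm), which is what bipartite graph matching buys you in practice.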

    Automatic and Human-AI Interactive Text Generation

    In this tutorial, we focus on text-to-text generation, a class of natural language generation (NLG) tasks that takes a piece of text as input and then generates a revision that is improved according to some specific criteria (e.g., readability or linguistic style) while largely retaining the original meaning and length of the text. This includes many useful applications, such as text simplification, paraphrase generation, and style transfer. In contrast to text summarization and open-ended text completion (e.g., story generation), the text-to-text generation tasks we discuss in this tutorial are more constrained in terms of semantic consistency and targeted language styles. This level of control makes these tasks ideal testbeds for studying the ability of models to generate text that is both semantically adequate and stylistically appropriate. Moreover, these tasks are interesting from a technical standpoint, as they require complex combinations of lexical and syntactic transformations, stylistic control, and adherence to factual knowledge -- all at once. With a special focus on text simplification and revision, this tutorial aims to provide an overview of state-of-the-art natural language generation research from four major aspects -- Data, Models, Human-AI Collaboration, and Evaluation -- and to discuss and showcase a few significant recent advances: (1) the use of non-autoregressive approaches; (2) the shift from fine-tuning to prompting with large language models; (3) the development of new learnable metrics and fine-grained human evaluation frameworks; (4) a growing body of studies and datasets on non-English languages; and (5) the rise of HCI+NLP+Accessibility interdisciplinary research to create real-world writing assistant systems.
    Comment: To appear at ACL 2024, Tutorial

    LENS: A Learnable Evaluation Metric for Text Simplification

    Training learnable metrics using modern language models has recently emerged as a promising method for the automatic evaluation of machine translation. However, existing human evaluation datasets for text simplification have limited annotations that are based on unitary or outdated models, making them unsuitable for this approach. To address these issues, we introduce the SimpEval corpus, which contains SimpEval_past, comprising 12K human ratings on 2.4K simplifications from 24 past systems, and SimpEval_2022, a challenging simplification benchmark consisting of over 1K human ratings of 360 simplifications, including GPT-3.5 generated text. Training on SimpEval, we present LENS, a Learnable Evaluation Metric for Text Simplification. Extensive empirical results show that LENS correlates much better with human judgment than existing metrics, paving the way for future progress in the evaluation of text simplification. We also introduce Rank and Rate, a human evaluation framework that rates simplifications from several models in a list-wise manner using an interactive interface, which ensures both consistency and accuracy in the evaluation process and is used to create the SimpEval datasets.
    Comment: Accepted at ACL 202
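The claim that a metric "correlates much better with human judgment" is typically quantified with a rank correlation between metric scores and human ratings over the same outputs. As a hedged illustration (the scores below are invented, and this is not the paper's evaluation code), a dependency-free Kendall's tau check looks like:

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall's tau rank correlation between two paired score lists (no tie correction)."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1   # the pair is ranked the same way by both lists
        elif s < 0:
            discordant += 1   # the pair is ranked in opposite ways
    n_pairs = len(xs) * (len(xs) - 1) / 2
    return (concordant - discordant) / n_pairs

# Hypothetical metric scores and human ratings for five simplifications.
metric_scores = [0.91, 0.42, 0.77, 0.15, 0.60]
human_ratings = [88, 35, 80, 20, 55]
tau = kendall_tau(metric_scores, human_ratings)  # 1.0 here: the two rankings agree exactly
```

A tau near 1 means the metric orders system outputs the way humans do; published comparisons of simplification metrics usually report exactly this kind of segment-level correlation.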